Enabling efficient stencil code generation in OpenACC
Abstract
The OpenACC programming model simplifies programming for accelerator devices such as GPUs. Its abstract accelerator model defines a least common denominator for accelerator devices, so it cannot represent the architectural specifics of these devices without losing portability. As a result, this general-purpose approach delivers good performance on average but misses optimization opportunities for code generation and execution in specific classes of applications. In this paper, we propose stencil extensions to enable efficient code generation in OpenACC. Our results show that the stencil extensions can improve OpenACC performance by up to 28% on GPU and up to 45% on CPU.
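For orientation, the kind of kernel the abstract targets can be sketched in plain OpenACC. The block below is a minimal illustration, not the paper's method. It uses only standard OpenACC directives and does not show the proposed stencil extensions, whose syntax the abstract does not give. The function name `jacobi_sweep`, the 5-point Jacobi stencil, and the row-major layout are assumptions chosen for the example.

```c
/* One Jacobi sweep of a 5-point stencil over the interior of an n x n grid.
 * Hypothetical example: `in` and `out` are row-major arrays of n*n floats,
 * and boundary cells of `out` are left untouched.
 * The pragma is standard OpenACC; a compiler without OpenACC support
 * ignores it and runs the loops serially. */
void jacobi_sweep(const float *restrict in, float *restrict out, int n)
{
    /* copy (not copyout) for `out` so its untouched boundary values survive. */
    #pragma acc parallel loop collapse(2) copyin(in[0:n*n]) copy(out[0:n*n])
    for (int i = 1; i < n - 1; ++i) {
        for (int j = 1; j < n - 1; ++j) {
            out[i * n + j] = 0.25f * (in[(i - 1) * n + j] + in[(i + 1) * n + j]
                                    + in[i * n + (j - 1)] + in[i * n + (j + 1)]);
        }
    }
}
```

The directive here only asks for the loop nest to be parallelized and the arrays to be moved. It carries no information about the neighbor access pattern, which is the device-specific detail the abstract says the generic model cannot express.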
Similar resources
Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers
GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. As such, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and ma...
Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model
In this work we use the GPU porting task for the operative Japanese weather prediction model “ASUCA” as an opportunity to examine productivity issues with OpenACC when applied to structured grid problems. We then propose “Hybrid Fortran”, an approach that combines the advantages of directive-based methods (no rewrite of existing code necessary) with that of stencil DSLs (memory layout is abstra...
Autotuning Tensor Contraction Computations on GPUs
We describe a framework for generating optimized GPU code for computing tensor contractions, a multidimensional generalization of matrix-matrix multiplication that arises frequently in computational science applications. Typical performance optimization strategies for such computations transform the tensors into sequences of matrix-matrix multiplications to take advantage of an optimized BLAS l...
Automatic Stencil Code Generation - Ph.D. Thesis Proposal
Stencil-based kernels constitute the core of many scientific applications on block-structured grids. These calculations form the basis for a wide range of scientific applications from simple Jacobi iterations to complex multigrid and block structured adaptive PDE solvers. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and ...
Generating Efficient Parallel Programs for Distributed Memory Systems
Leveraging the performance of distributed and shared memory clusters in scientific computing is challenging in terms of programmability and efficiency. The dimensions of the problem are data distribution, computation distribution, efficient communications and the ease of programming. To address those dimensions in a balanced manner, we present a directive-based programming model for hybrid dist...